Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI adjustments #3535

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

CI adjustments #3535

wants to merge 2 commits into from

Conversation

apostasie
Copy link
Contributor

@apostasie apostasie commented Oct 14, 2024

NOTE: this is pending #3572, #3573, #3574, #3575

This is the next part of the test rework effort.

The most important change here

Is that we will now have two separate steps for every integration test run:

  • without retry (this is new) - running all tests that are deemed safe ONCE, and hard failing if one of them fails
  • with retry (this is the previous behavior) - running any other test

Note that the above split does NOT apply to ipv6, nor kube, tests. These are considered safe.

Right now, the balance is about 286 (rootful, no retry) vs. 500 (with retry).
Hopefully, as we fix and migrate more tests, and address bugs in nerdctl, it will shift.

Note that running legacy tests without retries is currently very painful and does surface a lot of conditions that have been previously unseen / ignored for a long time. As a lot of bugs have been recently fixed with regards to concurrency, underlying issues are also starting to crop up, and these very likely run deep / are involved to fix.
These are chiefly:

This is the reason why we are - conservatively - leaving the pre-existing, legacy tests in the "with retries" bucket.

There is now a helper script, test-integration.sh.

When ran without any argument, if will run both steps in order:

If you want to run ONLY the first step (aka "safe tests"), call:
test-integration.sh -test.only-flaky=false

If you want to run ONLY the second step, call:
test-integration.sh -test.only-flaky=true

That works as well with gotestsum, or go test, evidently:

Note though that the default in that case is -test.only-flaky=false:

go test ./cmd/nerdctl/image
go test ./cmd/nerdctl/image -test.only-flaky
# etc...

Other CI changes

  • cosmetic efforts on test display titles (more compact, more consistent)
  • removal of all nick/retries
  • minor tweaks to the Dockerfile to adapt for new script

@apostasie apostasie changed the title Test rework, part 5 [WIP] Test rework, part 5 Oct 14, 2024
.github/workflows/lint.yml Outdated Show resolved Hide resolved
.github/workflows/test.yml Outdated Show resolved Hide resolved
.github/workflows/test.yml Outdated Show resolved Hide resolved
pkg/netutil/netutil.go Outdated Show resolved Hide resolved
pkg/testutil/testutil.go Outdated Show resolved Hide resolved
pkg/testutil/testutil.go Outdated Show resolved Hide resolved
pkg/testutil/testutil.go Outdated Show resolved Hide resolved
strategy:
fail-fast: false
matrix:
# ubuntu-20.04: cgroup v1, ubuntu-22.04 and later: cgroup v2
include:
- ubuntu: 24.04
containerd: v1.7.23
containerd: v2.0.0-rc.5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was discussed before: we run ipv6 tests only on the latest.

@apostasie
Copy link
Contributor Author

@AkihiroSuda this has been stripped down to the bones and is finally ready.

Obviously, disabling retries on a fraction of tests is still surfacing weird issues - oddly, docker - so, I will keep sending separate PRs until this thing is green, but otherwise, the CI code changes themselves are ready for review.

@@ -44,11 +44,13 @@ jobs:
- name: "Run unit tests"
run: go test -v ./pkg/...
- name: "Run integration tests"
run: docker run -t --rm --privileged test-integration
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split the two steps.

@@ -91,5 +94,6 @@ jobs:
ctrdVersion: ${{ env.CONTAINERD_VERSION }}
run: powershell hack/configure-windows-ci.ps1
- name: "Run integration tests"
# See https://github.com/containerd/nerdctl/blob/main/docs/testing/README.md#about-parallelization
run: go test -p 1 -v ./cmd/nerdctl/...
run: ./hack/test-integration.sh -test.only-flaky=false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split the two steps and use gotestsum for better summary


jobs:
test-unit:
# Supposed to work: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/evaluate-expressions-in-workflows-and-actions#example-returning-a-json-data-type
# Apparently does not
# timeout-minutes: ${{ fromJSON(env.SHORT_TIMEOUT) }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish that worked. Anyone has insight?

max_attempts: 2
retry_on: error
command: docker run -t --rm --privileged test-integration
run: docker run -t --rm --privileged test-integration ./hack/test-integration.sh -test.only-flaky=false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split the two steps.

@@ -129,7 +133,7 @@ jobs:
echo '{"ipv6": true, "fixed-cidr-v6": "2001:db8:1::/64", "experimental": true, "ip6tables": true}' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
- name: "Prepare integration test environment"
run: docker build -t test-integration-ipv6 --target test-integration-ipv6 --build-arg UBUNTU_VERSION=${UBUNTU_VERSION} --build-arg CONTAINERD_VERSION=${CONTAINERD_VERSION} .
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the extra stage that has nothing specific. It is just an extra arg.

max_attempts: 2
retry_on: error
command: docker run --network host -t --rm --privileged test-integration-ipv6
run: docker run --network host -t --rm --privileged test-integration ./hack/test-integration.sh -test.only-ipv6
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nick should go

@@ -248,12 +246,13 @@ jobs:
go-version: ${{ matrix.go-version }}
cache: true
check-latest: true
- name: "Cross"
- name: "build"
Copy link
Contributor Author

@apostasie apostasie Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better display. This is not "cross-compilation" it is building for different versions of go.

@@ -787,6 +787,9 @@ func newBase(t *testing.T, ns string, ipv6Compatible bool, kubernetesCompatible
} else if !base.EnableKubernetes && base.KubernetesCompatible {
t.Skip("runner skips Kubernetes compatible tests in the non-Kubernetes environment")
}
if !GetFlakyEnvironment() && !GetEnableKubernetes() && !GetEnableIPv6() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the part that keep the legacy tests into the "retry" part.

@apostasie
Copy link
Contributor Author

Oh docker, I...

@apostasie apostasie mentioned this pull request Oct 19, 2024
@apostasie apostasie changed the title Part 5: CI adjustments CI adjustments Oct 19, 2024
@apostasie
Copy link
Contributor Author

That was quite a trek.

@AkihiroSuda @djdongjin at your convenience.
(See ticket about CI status / data for more info on where we stand)

fi
done

if [ "$needsudo" != "" ]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can cause a confusion when WITH_SUDO is set to "0", "false", etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ci e.g., CI failure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants